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Abstract 

Background: Proteomes of thermophilic prokaryotes have been instrumental in structural biology and successfully 
exploited in biotechnology, however many proteins required for eukaryotic cell function are absent from bacteria or 
archaea. With Chaetomium thermophilum, Thielavia terrestris and Thielavia heterothallica three genome sequences of 
thermophilic eukaryotes have been published. 

Results: Studying the genomes and proteomes of these thermophilic fungi, we found common strategies of 
thermal adaptation across the different kingdoms of Life, including amino acid biases and a reduced genome size. 
A phylogenetics-guided comparison of thermophilic proteomes with those of other, mesophilic Sordariomycetes 
revealed consistent amino acid substitutions associated to thermophily that were also present in an independent 
lineage of thermophilic fungi. The most consistent pattern is the substitution of lysine by arginine, which we could 
find in almost all lineages but has not been extensively used in protein stability engineering. By exploiting 
mutational paths towards the thermophiles, we could predict particular amino acid residues in individual proteins 
that contribute to thermostability and validated some of them experimentally. By determining the 
three-dimensional structure of an exemplar protein from C. thermophilum (Arx1), we could also characterise the 
molecular consequences of some of these mutations. 

Conclusions: The comparative analysis of these three genomes not only enhances our understanding of the 
evolution of thermophily, but also provides new ways to engineer protein stability. 
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Background 

Proteins from thermophilic organisms are not only stable 
at higher temperatures, but are also generally more stable 
than their mesophilic counterparts. Therefore they are sci- 
entifically valuable, e.g. for biochemical and structural stud- 
ies, and have multiple applications in industry [1]. However, 
many proteins exclusively occur in eukaryotes, and only a 
few of the latter are thermophilic (defined as having an op- 
timal growth temperature [OGT] above 50°C; [2]. Recently, 
the first eukaryotic thermophilic genome, Chaetomium 
thermophilum, gave first insights into the potential for 
structural biology [3]. Now with two more genomes, Thie- 
lavia terrestris and Thielavia heterothallica [4] being 



* Correspondence: ed.hurt@bzh.uni-heidelberg.de; bork@embl.de 
2 Biochemie-Zentrum der Universitat Heidelberg, Im Neuenheimer Feld 328, 
Heidelberg D-69120, Germany 

Full list of author information is available at the end of the article 



published, comparative analysis of their thermophilic na- 
ture can be performed. 

Thielavia terrestris and Thielavia heterothallica (ana- 
morph Myceliophtora thermophila) are filamentous fungi 
of the class Sordariomycetes [4] which can be found in 
unnatural' habitats like compost. Their natural habitat 
seems to be in soils such as in semi-arid grasslands in New 
Mexico [5]. They are common in multiple microhabitats 
in this region, where high summer temperatures in combi- 
nation with episodes of substantial precipitation provide 
favourable conditions [5]. Chaetomium thermophilum is a 
widely distributed soil-inhabiting fungus and a thermophile 
in accordance with its lifestyle in self-heating composting 
plant material [6]. It can also be found in composting urban 
solid waste [7,8] and wood-chip piles [9,10]. C. thermophilum 
is a member of the large genus of Chaetomium, also within 
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Figure 1 (See legend on next page.) 
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(See figure on previous page.) 

Figure 1 Evolution of thermophily in Sordariomycetes. A) Maximum likelihood tree of Sordariomycetes based on 40 marker genes. Numbers 
on branches indicate bootstrap values. The mesophilic fungus Choetomium globosum clusters within the thermophiles with 100% bootstrap 
support. B) Amino acid frequencies in single copy proteins and reconstructed ancestral proteins were normalized for average and standard 
deviation. Coloured triangles are according to the colours of the branches in A: blue Cgl; dark green Tht; light green ancestor of Cgl and Tht; 
yellow Tht; orange ancestor of Tte and Cgl; purple Cth; red ancestor of Cth and Cgl. Black stars indicate significant differences between the 
thermophiles and the mesohpiles, purple stars between C. thermophilum and the mesophiles C) Amino acid frequencies of single copy proteins 
within the sub Class of Eurotiomycetidae, containing two sequenced thermophiles; Thermomyces lanuginosis and Talaromyces thermophilus. Black 
stars indicate significant differences between the thermophiles and the mesohpiles. 



the Sordariomycetes, that are found in soil, air, and plant 
debris [11]. Close relatives of these thermophilic fungi 
are the mesophilic mould fungus Chaetomium globosum 
(OGT 24°C), a frequent indoor contaminant that produces 
mycotoxins and acts as an allergen [11], and Neurospora 
crassa, another mesophilic filamentous fungus of which 
the genome has been published [12]. 

Due to their thermostable nature, proteins from thermo- 
philic fungi have recently gained considerable attention in 
industry and structural biology. Several crystal struc- 
tures of proteins from these thermophilic fungi have 
been determined such as those of two beta 1,4-galactanases 
from T.heterothallica [13], a glycoside hydrolase from 
T. terrestris [14], and Get3, Get4 and beta 1,4-xylanase 
from C. thermophilum [15-17]. The paper-industry uti- 
lizes members of the beta 1,4-xylanase family for bio- 
bleaching of kraft-pulp [18,19]. The biotechnological 
potential of C. thermophilum is also illustrated by the puri- 
fication and characterization of its thermostable superoxide 
dismutase (SOD) [20], an enzyme which is utilized in cos- 
metic products to reduce free radical damage to the skin. 
Furthermore, the genomes of C. thermophilum, T. terrestris 
and T. heterothallica provide a source of thermostable cel- 
lulolytic enzymes, such as the glycoside hydrolases that can 
be used in the production of third-generation biofuels [14]. 

Here, we identify commonalities and differences of 
thermophilic adaptation between eukaryotes and prokar- 
yotes and exploit the close relationship of the thermophilic 
to mesophilic fungi to gain detailed insight into the molecu- 
lar evolution of thermophily. By comparing the genomes of 
thermophilic fungi to each other and to mesophilic relatives 
we can clarify the evolutionary trajectory that has been 
obscured by inconsistent naming conventions [4] and de- 
termine whether there are independent events of gain of 
thermophily in these fungi. We further use the observed 
adaptation biases to predict mutations that can increase the 
thermostability of proteins and verify them experimentally. 

Results and discussion 

Taxonomic position of thermophilic fungi within 
Chaetomiaceae 

To determine the phylogenetic relationships between 
thermophilic and mesophilic fungi of the Sordariomycetes, 



we searched for the presence of 40 phylogenetic marker 
genes [21] in published and unpublished genomes of 
this clade using Hidden Markov Models (HMMs; see 
Materials and Methods), and used bootstrapping and 
Maximum Likelihood to calculate a phylogenetic tree 
(Figure 1A). Despite the different naming, the three 
thermophilic species closely group together, implying that 
the most parsimonious scenario is a single invention of 
thermophily. However, Chaetomium globosum, the closest 
mesophilic neighbour of these three thermophilic species 
is monophyletic within the thermophiles with 97% boot- 
strap support and most likely lost thermophily. As this 
was surprising, we also generated phylogenetic trees using 
2,064 universal single copy orthologs established specifi- 
cally for the Sordariomycetes using the eggNOG pipeline 
[22]. We indeed could confirm the taxonomic positions 
implying loss of thermophily (Additional file 1: Figure SI). 
Thus, by studying this lineage we can gain insight both in 
the gain and loss of thermophily. 

Genome reduction in thermophiles 

The genomes of C. thermophilum (Cth), T. terrestris (Tte) 
and T. heterothallica (Tht) are significantly smaller than 
their close mesophilic relatives such as Chaetomium 
globosum (Cgl) and Neurospora crassa (Ncr). In agree- 
ment with previous studies of prokaryotic thermophiles, 
the genome size reduction is due mainly to fewer protein 
coding genes (Cth 7,267; Tte 9,813; Tht 9,110 vs Cgl 
11,124 and Ncr 10,620), but also to shorter introns and 
shorter intergenic regions (Additional file 1: Figure S2) 
and [4]. Since C. globosum is derived from the ancestor of 
these three species, there are two possibilities. Either, this 
ancestor had a small genome and C. globosum has gained 
genes by duplications or horizontal transfers or the three 
thermophiles have independently lost genes in a parallel 
adaptation process. Although larger genomes in the out- 
groups makes a loss and gain scenario more likely, we 
investigated all orthologous groups from the complete ge- 
nomes of 20 members of the Sordariomycetes (sorNOGs) 
to clarify the gene content evolution of eukaryotic thermo- 
philes. Firstly, we analysed the phylogenetic presence/ 
absence patterns of these sorNOGs. In total, 4,542 protein 
coding genes are present in equal copy numbers in each of 
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the four species Cth, Tte, Tht and CgL Present in one copy 
but absent from either of the four are 330 (Cgl), 125 (Tte), 
130 (Tht) and 440 (Cth) orthologs, meaning that lineage 
specific loss alone does not account for the differences in 
genome size. C. globosum specific duplications are respon- 
sible for ca. 150 extra genes. It must be noted that some 
lineage specific losses may also be accounted for by differ- 
ence in genome quality, but the tendencies will remain. 

On the other hand, there are 845 orthologous groups 
covering 1,004 genes of C. globosum that are absent in 
all three others. These numbers are 181 (190), 325 (353) 
and 543 (579) orthologous groups (genes) for Cth, Tte 
and Tht. The difference in genome size can thus partly 
be assigned to these orthologous groups. A large num- 
ber of these are related to transposable elements, in- 
cluding 30 transposases, 74 reverse transcriptases, 30 
DNA helicases. The lack of these elements in the 
thermophilic fungi may indicate that transposition is 
unfavourable at higher temperatures. 

Oxygenases and enzymes hydrolyzing complex sugars 
are in particular frequently lost in the thermophiles. This 
does not always mean that metabolic capabilities are com- 
pletely absent; often multigene families in N. crassa and 
C. globosum have only one counterpart in C. thermophi- 
lum, but also non-homologous isoforms are reduced to 
one enzyme, implying a reduction in robustness. Proteins 
that are completely missing in C. thermophilum but not in 
the two Thielavias include WC1, WC2 and FRQ which 
are involved in the regulation of the circadian clock 
[23,24]. We hypothesize that due to the localization far in- 
side the compost away from light (implied by the high 
temperature optimum) the day-night rhythm does not 
play a role for C. thermophilum. 

There are no major gene family expansions in the ther- 
mophiles compared to their relatives, only a few ortholo- 
gous groups have been slightly expanded against the 
reductionist trend. The majority of them are uncharacter- 
ized, but some indicate life style adaptation such as a cello- 
biose dehydrogenase of which C. thermophilum has three 
copies and C. globosum and N. crassa only two, reflecting 
an increased wood degradation capacity. T. terrestris has 
five copies of a S-adenosyl-L-methionine (SAM) dependent 
methyltransferase that is likely to employ arsenite as sub- 
strate where its relatives have only one or two. The largest 
lineage specific expansion in T. heterothallica is an ortholo- 
gous group with three copies of a scytalone dehydratase 
involved in fungal melanin biosynthesis. Melanin provides 
resistance to UV radiation, drought and high temperatures 
[25] and thus this expansion likely represents a thermo- 
philic adaptation. The lack of major expansions suggests 
that the metabolisms of the thermophilic fungi have not 
undergone major niche adaptations requiring additional 
functionality, and that the dominating adaptation was in- 
deed the one to higher temperatures. 



Convergent evolution of thermophily across all domains 
of life 

It has been previously shown that the amino acid fre- 
quencies vary with the OGT, specifically the summed 
frequency of the amino acids IVYWREL shows the high- 
est correlation with OGT in both bacteria and archaea 
[26]. In these domains of life, the ancestor was likely a 
thermophile and adaptation happened to colder environ- 
ments [21]. 

We therefore investigated whether the molecular princi- 
ples of thermostability in fungal proteins are similar. In 
alignments of the 2,064 single copy orthologs universal in 
Sordariomycetes (see Methods and Table 1 for species 
list), we find that the total frequency of IVYWREL amino 
acids as in thermophilic archaea and bacteria is signifi- 
cantly higher in C. thermophilum compared to the other 
Sordariomycetes but not in T. heterothallica and T. terres- 
tris (P-value < E" 16 ). This is explained mainly by the ex- 
tremely high frequencies of isoleucines, tryptophans and 
tyrosines in C. thermophilum (Figure IB). Addition of 
these large hydrophobic amino acids is likely to play a role 
in filling the hydrophobic cores of proteins (e.g. [27] and 
below). Only part of this signal, the increased levels of 
arginine and tryptophane are present in all three thermo- 
philes. Specific to the two Thielavias is an enrichment in 
alanine. Furthermore, consistent differences between the 
three thermophilic and the mesophilic fungi are lower fre- 
quencies of aspartic acids and lysines in the thermophiles 
(Figure IB). The more extreme reduction of genome size 
together with the IVYWREL bias in C. thermophilum 
leads us to hypothesize that this fungus might survive at 
higher temperatures than the two Thielavias for which 
optimal growth temperatures have not been published yet. 

Analyzing the amino acid frequencies from bacterial 
(Additional file 1: Table SI) and archaeal (Additional file 1: 
Table S2) clades with thermophilic members, we observe a 
striking difference with eukaryotes; an overrepresentation 
of cysteines in C. thermophilum proteins (Figure IB); in 
total 15% of cysteines in aligned positions are unique to 
C. thermophilum. The major categorized roles of cysteines 
are in catalytic residues, disulfide bridges and metal binding 
(e.g. zinc fingers), whereby the latter two contribute to fold- 
ing and stability. Cysteines have also been shown to con- 
tribute to thermal stability in their free form, when they 
form interactions inside the core of a protein [28]. This 
unique adaptation of C. thermophilum may be another 
indication that its proteins are better adapted to high tem- 
peratures than the other two thermophilic Sordariomycetes. 
Another difference between prokaryotes and eukaryotes 
that we observe is that glycines are strongly depleted in 
C. thermophilum whereas they are enriched in C. globosum 
compared to the complete clade of Sordariomycetes. The 
exchange of alanines with glycines has been shown to 
destabilize alpha-helices, particularly in the center of the 
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helix [29]. It seems as if C. globosum has indeed used this 
strategy to make proteins less thermo-stable, and C. ther- 
mophilum has evolved in the opposite direction, lowering 
its glycine content. 

We verified the generalizability of these trends by exam- 
ining two more unpublished thermophilic fungal genomes, 
Thermomyces lanuginosus and Talaromyces thermophilus 
of the subclass Eurotiomycetidae, a different fungal clade 
that also includes Aspergillus fumigatus and Emericella 
nidulans. Compared to their mesophilic neighbours, these 
species both have a significantly higher total frequency of 
IVYWREL amino acids (P < le-7). They also show a dep- 
letion of glycines and significant enrichment in arginines 
and alanines (Figure 1C) consistent with the biases in the 
thermophilic Sordariomycetes. This shows that some of the 
trends are indeed universal between different clades of 
fungi. 

Mutational paths towards thermophily 

In contrast to thermophilic prokaryotes, the genomes of 
thermophilic fungi have very close, known mesophilic 
relatives and thus, for the first time, we can trace and 
quantify the mutational paths by which the differences in 
amino acid composition arise (Methods). We therefore 
have quantified the mutation biases between pairs of 
amino acids in all branches of the Sordariomycetes tree 
and determined how different they are in one branch 
compared to the rest of the tree (Figure 2). This is similar 
to but more specific than a previous analysis on biases 
between pairs of prokaryotic thermophiles and mesophiles 
[30]. In the prokaryote study the mesophile-thermophile 
species pairs were much more dissimilar than our 
mesophile-thermophile relatives and thus there would be 
a large effect of multiple substitutions at each site result- 
ing in 139 out of 190 amino acid pairs showing a bias. 
Likely because of the difference in evolutionary distance 
we observe a smaller number of significantly biased amino 
acid pairs (65 out of 190) in the branches leading to 
thermophilic fungi (Figure 3). We observed that mutation 
bias between several small amino acids and prolines has led 
to higher frequency of prolines already in the ancestor of 
the thermophilic Sordariomycetes (Figure 3A). Analyzing 
the amino acid frequencies from bacterial (Additional file 1: 
Table SI) and archaeal (Additional file 1: Table S2) clades 
with thermophilic members we also found that proline fre- 
quency is increasing with higher OGT (Additional file 1: 
Table S3) which is significant in bacteria but not in archaea; 
Prolines make the protein structure more rigid and less 
likely to unfold as has been shown before in case studies 
[31-33]. This strengthens the hypothesis that the ancestor 
of the thermophilic Sordariomycetes and C. globosum was 
also thermophilic. Furthermore, there are significantly more 
mutations from lysine to arginine than vice versa; the 
replacement of lysine by argnine has been shown to lead to 



Table 1 Genomes used for amino acid bias analysis 


Sordariomycetes 


Eurotiomycetidae 


Acremonium olcolophilum 


Arthroderma otae 


Chaetomium globosum 


Aspergillus aculeatus 


Chaetomium thermophilum 


Aspergillus carbonarius 


Colletotrichum higginsionum 


Aspergillus fumigatus 


Cryphonectria parasitica 


Aspergillus niger 


Fusarium oxysporum 


Aspergillus terreus 


Gibberella moniliformis 


Blastomyces dermatitidis 


Gibberella zeae 


Coccidioides immitis h538 


Glomerella graminicola 


Coccidioides immitis rs 


Hypocrea jecorina 


Emericella nidulans 


Hypocrea virens 


Histoplasma capsulatum hi 43 


Magnaporthe grisea 


Histoplasma capsulatum h88 


Nectria haematococca 


Microsporum gypseum 


Neurospora crassa 


Paracoccidioides brasiliensis 


Neurospora discreta 


Talaromyces thermophilus 


Neurospora tetrasperma 


Thermomyces lanuginosus 


Thielavia heterothallica 


Trichophyton equinum 


Thielavia terrestris 


Trichophyton rubrum 


Trichoderma atroviride 


Trichophyton tonsurans 


Verticillium dahliae 


Trichophyton verrucosum 




Uncinocarpus reesii 



less fluctuations in side groups [34]. This lysine to arginine 
bias is present in four out of five branches leading to ther- 
mophily in Sordariomycetes (Figure 3A) [30]. Other con- 
sistent biases are between aspartic and glutamic acid as 
well as between threonine and alanine, where we observe 
the opposite trend in the branch where the thermophily is 
lost, leading to C. globosum. The increased level of lysine to 
arginine mutations as hallmark of eukaryotic thermophilic 
adaptation was confirmed in two out of three branches 
in Eurotiomycetidae leading to the two monophyletic 
thermophilic species T. lanuginosus and T. thermophilus 
(Figure 3B). Moreover the strong bias of serine to alanine 
is also present in these species. Apart from these consis- 
tent biases, there are also unique, individual biases in the 
branches. As in prokaryotes it seems to be also the case in 
eukaryotes that increased thermostability can be achieved 
in many ways depending on the context. 

Considering the consistent biases, we analysed par- 
ticular residues in orthologous groups shared between 
Sordariomycetes and Eurotiomycetidae. We found that 
where the same biases exist, the overlap between posi- 
tions where e.g. arginines have been introduced instead 
of lysines is significant but small, i.e. 92 out of a total of 
7,335 positions that are changed from lysine to arginine 
in any of the thermophiles are shared between all five. 
This leads us to believe there are some positions that 
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Figure 2 Workflow for finding biases and a scoring scheme of thermophilic mutations. 



are more likely to increase stability and may even be es- 
sential; however mutations at many other positions can 
contribute independently. 

In contrast to prokaryotes where GC content has been 
found to cause a bias in amino acid frequencies of lysines 
and arginines [35,36], as previously reported in these fungi 
the GC content does not differ significantly between 
mesophiles and thermophiles [3]. There is an elevated GC 
content at the third codon position as reported by [4], 
however the frequencies of G at C and the third codon 
position do not differ between lysine and arginine. There- 
fore, in thermophilic fungi the lysine- arginine bias has 
arisen independently of the GC content. 

Scoring scheme for adaptive mutations 

Based on our observations, we developed a scoring scheme 
to give weight to individual mutations for their contribution 



to thermophily (Figure 2). We used the mutation bias 
between pairs of amino acids in the branches leading to the 
thermophilic ancestor as well as to C. thermophilum to 
arrive at these scores (see Methods). We predict that those 
positions with a high score are responsible for the thermo- 
philic adaptation of individual proteins. In this way, we can 
distinguish which thermophile specific mutations are likely 
to be adaptive and which are likely to be neutral. Since the 
thermophilic nature of proteins has been lost in C. globosum, 
we can also predict which mutations have been responsible 
for this loss. In this way we predicted 38,385 thermophilic 
adaptive mutations in 2,064 single copy proteins for which 
we could trace the ancestral amino acid sequences. 

Mutations important for thermophilic stability 

To validate some of these predictions experimentally, we 
applied them to a protein from C. thermophilum, which is 



van Noort et al. BMC Evolutionary Biology 2013, 13:7 
http://www.biomedcentral.eom/1471-2148/13/7 



Page 7 of 1 3 





Chaetomium thermophilum 



© 

.© v* 



©— © 



• Thielavia terrestris 



£ 5 

o 4 
o 3 



Number of signficant amino-acid bias pairs 



52 



B 



Q-G 



©— © 












© 




o 


e^ 6 



Thielavia heterothallica 



• Chaetomium globosum 



© 



. Talaromyces thermophilus 



■ Thermomyces lanuginosis 



© © 



©-TO© © 



Figure 3 Biases in pairwise mutation rates. Alignments were made of all single copy orthologous groups and subjected to parsimony 
reconstruction using the marker gene tree as a guide tree. The frequencies of mutations between all pairs of amino acids were analysed. 
The ratios between all pairs of amino acids were compared to the ratios in the reconstructed phylogeny between mesophiles. In principle, 
it is expected that there are as many mutations from X to Y as from Y to X. Thus a bionomial test can be used to assess a bias. However, 
there are also biases in the complete groups of Sordariomycetes and Eurotiomycetidae. Therefore the expected ratio is not set to 1:1, but to the 
actual ratio in the mesophilic neighboring species. Amino acid pairs with significant bias are connected by coloured lines, with an arrowhead 
proportional to the bias in the direction of the more frequent mutations. A histogram of bias pairs is showing how often among all nine 
branches the bias pair is observed, colours are used to connect the amino acids in A) thermophilic Sordariomycetes and B) thermophilic 
Eurotiomycetidae. 



homologous to yeast pre-ribosomal export factor Arxl 
(Associated with Ribosomal eXport complex) [37]. C. 
thermophilum Arxl (c£Arxl) is thermostable (soluble) up 
to 53°C at a concentration of 8 mg/ml, whereas the Arxl 
from the mesophilic C. globosum (cgArxl; Figure 4 A, B) 



precipitates already at 35°C (Figure 4C), corresponding to 
the OGTs of both organisms. Circular dichroism (CD) 
spectra showed that cfcArxl began to unfold in vitro at 55°C 
and reached complete unfolding at ~70°C (Figure 5). To 
test whether the predicted adaptive and neutral mutations 
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Figure 4 Amino acids responsible for thermostability. A) Alignment of Arx1 protein of the three thermophilic Sordariomycetes, C. globosum 
and N. crassa. Arrows represent positions of introduced mutations that we predict to be adaptive to thermophily (red) or neutral (blue). 
Secondary structure elements are indicated above the alignment (cylinder: a-helix; arrow: (3-strand; line: loop regions; dotted line: not solved in 
crystal structure). Violet squares represent mutations in the cgArxl mutl 1 and yellow squares represent all other differences between ctArxl and 
cgArxl. Amino acids are colored with the default color scheme of ClustalX [38]. B) Arx1 from C. thermophilum (ctArxl) is thermostable and its 
thermostability can be influenced by mutating residues predicted to be adaptive for thermophily. Recombinant wildtype ctArxl and mutant 
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bands with Aida Image Analyzer v. 4.00. The input was set to 100%. C) Recombinant cgArxl mutl 1 protein contains mutations in 1 1 residues 
predicted to have destabilized the protein of C. globosum. Thermostability analysis of recombinant cgArxl and cgArgx1mut1 1 as described in (B). 



have an effect on the thermostability of our model protein, 
we generated two mutant ctArxl proteins with either five 
predicted adaptive or five predicted neutral positions in 
ctArxl changed to the respective ancestral residues 
(Figure 4A, B). The predicted non-destabilizing (neutral 
residues) c£Arxl mutant behaved like wild-type ctArxl 
and remained soluble up to 53°C (at 8 mg/ml) and up to 



55°C (6-fold diluted; Figure 4B). However, the predicted 
destabilizing (adaptive residues) ctArxl mutant remained 
soluble only up to 49°C (8 mg/ml) and 50°C (6-fold 
diluted; Figure 4B; Additional file 1: Figure S3). This con- 
firms our prediction scheme to find mutations that in- 
crease thermostability. Furthermore, we identified eleven 
C. globosum specific mutations that we think are likely to 
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concentration is monitored by CD at 222.6 nm under a temperature 
gradient. The normalized ellipticity is plotted against the 
temperature. 



have destabilized this protein (Figure 4A, C). Introducing 
the ancestral amino acid for all these eleven mutations 
indeed increased the temperature at which the protein 
remained soluble. Thus we could turn back time and create 
a thermostable from an unstable protein. 

Structural context for adaptive mutations 

To reveal mechanistic roles for adaptive residues, we deter- 
mined the 3D structure of c£Arxl that shares the pita-bread 
fold with methionine-aminopeptidases [39] and Ebpl [40] 
(Figure 6). Expression, purification, crystallization and x-ray 
structure determination of this protein was successful, 
supporting the value of C. thermophilum as a model system 
for structural studies. The two selected adaptive proline 
mutations (P41, P104) indeed occur in loops of c£Arxl 
(Figure 6A) preventing unfolding as mentioned above 
[31-33]. Another fundamental concept in thermo- 
adaptation of proteins is an increased bulkiness of 
hydrophobic amino acids within the protein core. 
According to some models, unfolding is due to the 
transfer of water into the protein hydrophobic core that 
progressively breaks hydrophobic contacts and swells 
the protein interior [27]. Direct sequence comparison 
of c/Arxl with cgArxl indeed shows that 80% (16/20) 
of the hydrophobic amino acid exchanges lead to 
increased bulkiness. Several of these adaptive bulky 
hydrophobic residues in c/Arxl (F146, F362 and W357) 
together with VI 28 form an extended hydrophobic 
cluster, which together with adaptive C335 leads to a 
tight packing of helices a3 and a9 to the central beta- 
sheet (Figure 6 A, E). Acquired electrostatic interactions 
are found between the imino-group of W357 and D124 
((34) linking (34 to a3 more tightly (Figure 6C) and 
between adaptive R350 in (313 and E154 stabilizing (313 
with respect to helix a3. Mutation of F146, W357 and 
R350 reduces the thermostability of c/Arxl by about 



2°C to 51°C (Figure 6B, C). In addition, mutation of the 
two hydrophobic residues (F146, W357) on top of the 
five adaptive mutations leads to a further decrease of 
c/Arxl thermostability to 47°C (Figure 6B, D). Taken 
together, these examples of adaptive mutations in the 
context of the 3D structure of ctArxl illustrate how in- 
dividual residues and their interactions contribute to a 
thermophilic adaptation. 

Conclusions 

Here, we show that the principles of thermophilic adap- 
tations in fungi are similar to that in prokaryotes, with 
the notable exception of cysteines that are enriched in 
C. thermophilum and that might contribute to thermo- 
phily in several ways. The close relation of mesophilic 
species allows predicting particular mutations that are 
directly responsible for thermo- adaptation, which we 
could confirm experimentally by protein engineering. By 
solving the 3D structure of a single thermophilic protein 
(Arxl of Cth), we could identify three different types of 
adaptive mutations : (i) loop rigidity by increased proline 
frequency, (ii) increased protein core hydrophobicity, 
and (iii) increased electrostatic interactions stabilizing 
neighboring secondary structure elements. 

By now, several structures have been determined 
already based on C. thermophilum, T. terrestris and T. 
heterothallica proteins [13-17] and we and others have 
determined the thermostable nature of several other 
proteins [3,41]. This, together with our finding of thou- 
sands of mutations towards thermophily in this lineage, 
implies that the thermostability of proteins is a major con- 
tributor to the increased OGT of these organisms, in par- 
ticular in C. thermophilum. C. thermophilum is, as are T. 
terrestris and T. heterothallica promising resources for 
(thermo) stable proteins for industrial purposes as well as 
for biochemical and structural studies that rely on stable 
eukaryotic proteins and the assembly of complex molecu- 
lar machines. With experimental tools such as genetic 
transformation protocols and a number of independent 
lineages containing thermophilic eukaryotes, a rapidly 
increased understanding should lead to precise predic- 
tions which particular mutation increases thermophily 
via which mechanisms for a vast amount of important 
eukaryotic proteins. 

Methods 

Fungal orthologous groups 

Published genomes were downloaded from NCBI. Un- 
published genomes were downloaded from ftp-sites of the 
Joint Genome Institute, the BROAD institute and Genome 
Canada. Non-supervised orthologous groups (NOGs) 
were constructed for 20 Sordariomycetes and 21 Eurotio- 
mycetidae (Table 1) through identification of reciprocal 
best BLAST [42] matches and triangular linkage clustering 
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as implemented in eggNOG v2 [22]. This resulted for 
Sordariomycetes in 17,325 and for Eurotiomycetidae in 
14,979 non-supervised orthologous groups (NOGs). Out 
of 7,227 C. thermophilum proteins, we find orthologs in 
other Sordariomycetes for 7,045 of them. We found 2,064 
NOGs that contain exactly one copy from each Sordario- 
mycetes proteome (universal single copy orthologs) and 
1,436 that contain exactly one copy from each of the Euro- 
tiomycetidae. HMMs of 40 marker genes [21] were used 
to search all 20 Sordariomycetes proteomes. These were 
aligned using MUSCLE [43], pruned with GBLOCKS [44] 
and a tree was built using RAxML [45]. Trees were dis- 
played using iTOL [46]. The same procedure was applied 
for all 2,064 universal single copy orthologs. 



Amino acid overrepresentation 

Alignments of universal single copy orthologs were made 
using MUSCLE [43] with standard settings. Amino acid 
frequencies were counted in all aligned positions that do 
not contain gaps. The frequencies in C. thermophilum were 
compared against frequencies in mesophilic Sordariomy- 
cetes and Z-tests were done to obtain significant differences 
in all aligned positions. Similarly, T-tests were done to ob- 
tain significant differences between the three thermophiles 
and their ancestral nodes on the one hand and the me- 
sophiles and their ancestral nodes on the other hand. 
Significant differences are shown as stars in Figure IB and 
C. A similar analysis was done in a group of seven bacteria 
(Additional file 1: Table SI) and in a group of seven archaea 



van Noort et al. BMC Evolutionary Biology 2013, 13:7 
http://www.biomedcentral.eom/1471-2148/13/7 



Page 11 of 13 



(Additional file 1: Table S2) with large variance in the opti- 
mal growth temperatures. In this case there was not one 
species that was thermophilic, therefore, rather than a Z- 
test, correlations of the AA-frequencies with OGT were 
calculated (Additional file 1: Table S3) and t- tests were 
done between the AA-frequencies of (hyper)thermophilic 
and mesophilic species to obtain significances (Additional 
file 1: Table S3). 

Mutational paths 

Parsimonious reconstructions of ancestral states were 
made using PROTPARS from the Phylip package [47] 
with the fungi tree as user tree for all single copy NOG 
alignments. From the output file, steps at each position 
were parsed and counted only if they were unambiguous. 
The frequencies of mutations between all pairs of amino 
acids were analysed. The ratios between all pairs of 
amino acids were compared to the ratios in the whole 
reconstructed phylogeny. In principle, it is expected that 
there are as many mutations from X to Y as from Y to 
X. Thus a bionomial test can be used to assess a bias. 
However, there are also biases in the complete groups of 
Sordariomycetes and Eurotiomycetidae. Therefore the 
expected ratio is not set to 1:1, but to the actual ratio in 
the mesophilic neighboring species. 

Scoring of amino acid substitutions 

We developed a scoring scheme to give a weight to indi- 
vidual mutations for their contribution to thermophily. 
We used the mutation bias between pairs of amino acids 
to arrive at these scores. We calculate the binomial prob- 
ability of the number of mutations from amino acid X to 
Y vs Y to X, given the average ratio between X to Y and Y 
to X in the whole Sordariomycetes tree. The logarithm of 
this probability is multiplied by -1 to come to a score S 
for pair X and Y. If in a phylogenetic reconstruction, there 
is a mutation from X to Y and there is a significant bias 
from X to Y, this mutation will get the positive score S, if 
there is a significant bias from Y to X, it will get the nega- 
tive score S, otherwise the mutation is not scored. 

Purification of recombinant protein 

ORFs for cgARXl, ctARXl, ctarxl-destab and oXarxl-non- 
destab were synthesized and sequenced by Eurofins MWG 
Operon (Ebersberg, Germany) or GenScript (Piscataway, 
NJ, USA) and subcloned into pET-24a(+) vector. Proteins 
were expressed in E. coli BL21 (DE3) grown in LB-medium 
at 37°C under vigorous shaking. Cell pellets were resus- 
pended in buffer A (20 mM Hepes-NaOH pH 8.0, 350 mM 
NaCl, 10 mM KC1, 10 mM MgCl 2 , 40 mM imidazol). Cells 
were lysed by a Microfluidizer (M-110 L, Microfluidics) 
and the lysate was cleared by ultracentrifugation at 
91,000 x g for 20 minutes. Recombinant protein was 
purified by Ni-ion affinity chromatography (Ni-NTA- 



HisTrap, GE-Healthcare) via an N-terminal hexa-histidine 
tag and eluted with buffer A supplemented with 460 mM 
imidazol. c£Arxl was further purified by size exclusion 
chromatography (S200-26/60, GE-Healthcare) in a buffer 
containing 20 mM Hepes-NaOH pH 8.0, 200 mM NaCl, 
10 mM KC1 and 10 mM MgCl 2 . 

Crystallization and structure determination of ctArxl 

Crystals of c£Arxl were grown at 18°C by the sitting drop 
vapour diffusion method. Sitting drops were prepared by 
mixing 0.5 ul of fresh cfcArxl (15 mg/ml) with 0.5 ul of res- 
ervoir solution containing 0.2 M LiAcetate and 2.2 M 
(NH 4 ) 2 S0 4 . Prior X-ray analysis crystals were flash-frozen 
in liquid nitrogen after cryo-protection by transfer into a 
cryosolution containing mother liquor and 25% v/v 
glycerol. Data-collection was performed at ID23/1 at the 
European Synchrotron Radiation Facility in Grenoble 
(France). Data were processed in iMosflm and Scala [48]. 
The structure of c£Arxl was solved by molecular replace- 
ment using ccp4 implemented PHASER [49] and the crystal 
structure of Ebpl as the search model [40]. The structure 
was manually built in Coot [50] and refined with Refmac5 
[51]. Data and refinement statistics are given in Table 2. 
Figures were generated with Pymol (www.pymol.org). 

Thermostability tests 

Thermostabilites of cfcArxl and cgArxl were determined by 
testing an in vitro aggregation. For this assay, recombinant 

Table 2 Crystal data of Arxl from C. thermophilum 



Data collection 



Space group 


P2{1{1 


Unit cell parameters (A) 


192.0, 193.3, 70.9 


0 


90, 90, 90 


Resolution (A) 


86.4 - 2.3 (2.42 - 2.3) 


R a 

n Merge 


0.127 (0.53) 


Unique reflections 


118103 (17031) 


Completeness (%) 


100 (100) 


Multiplicity 


5.9 (5.9) 


<l/ol> 


13.7 (4.0) 


Refinement 




Number of used reflections 


112170 


Resolution limits (A) 


57.0 -2.3 


R factor b (%) 


20.2 


Free R factor c (%) 


24.2 


Rmsd bond lengths (A) 


0.016 


Rmsd bond angles (°) 


1.535 



a* Rmerge = ^h^jl'hj - <'h>|/^h^> where l hj is the intensity of the jth observation 
of the unique reflection h. 

b. R factor = X h | | F oh | - |F ch ||/I h |F oh |, where F oh and F ch are the observed and 
calculated structure factor amplitudes for reflection h. 

c. The free R factor is equivalent to the R factor, but is calculated using 5% of 
reflections excluded from the maximum-likelihood refinement stages. 
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c£Arxl and cgArxl were purified from E. coli and incubated 
at the indicated temperatures (see Figure 3B, D, Additional 
file 1: Figure S7B) for one hour in buffer 2 (50 mM 
Tris-HCl pH 7.5, 200 mM NaCl, 10 mM KC1, 10 mM 
MgCl 2 , 5% (v/v) glycerol, 0.01% (v/v) MTG). Following cen- 
trifugation at 20,000 rpm at 4°C for 30 minutes, an equiva- 
lent sample of the supernatant and the pellet fraction was 
separated by SDS-polyacrylamide gel electrophoresis 
(PAGE; NuPAGE 4-12% Bis-Tris Gel, Invitrogen). Proteins 
were visualized with Coomassie (Brilliant Blue G - colloidal 
Concentrate, Sigma- Aldrich). 

Circular dichroism 

For measuring unfolding of c£Arxl the circular dichroism 
(CD) was recorded at different temperatures. Dichroism 
spectra from c£Arxl were recorded at a protein concentra- 
tion of -0.1 mg/ml on a Jasco J-810 spectropolarimeter in 
a 0.1 cm path length cuvette at 20°C. Proteins were 
exchanged into 10 mM potassium phosphate, pH 7.5. 
Four scans were measured from 250 to 200 nm in 1 nm 
increments with a 1 s averaging time and a bandwidth of 

1 nm. The scans were averaged, and the buffer spectrum 
was subtracted. Mean residue ellipticity 0 MRW was calcu- 
lated according to Equation 1, where 0 is the raw signal 
in millidegrees, 1 is path length in cm, n is the number of 
amino acids, and c is the concentration of the protein in 
moles per liter. 

@mrw = — — r~ (1) 

10 x 1 x n x c 

Thermal denaturation 

Thermal unfolding transitions of c£Arxl were followed by 
circular dichroism at 222.6 nm with 1 nm bandwidth in 

2 mm cells and a heating rate of 1°C per minute using a 
Jasco J-810 spectropolarimeter in 10 mM potassium phos- 
phate, pH 7.5, at a protein concentration of -0.1 mg/ml. 

Additional file 



ctArxl wild-type recombinant proteins were affinity-purified and 
incubated at the indicated temperatures for 1 hour, separated into 
supernatant (S) and pellet (P) fractions by centrifugation and subjected to 
SDS-PAGE and Coomassie stain in comparison to the input (I). PS: protein 
standard. 
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